feat(sdk-core): add SdkWarmUp.prime() for CRaC auto-priming #7056

Merged: joviegas merged 4 commits into feature/master/crac_auto_priming_support from joviegas/crac_warmup_orchestration_partA on Jun 23, 2026

@joviegas joviegas commented Jun 19, 2026

Contributor

Motivation and Context

Next building block of CRaC (Coordinated Restore at Checkpoint) auto-priming: the warm-up entry point that drives priming before a checkpoint. Builds on the previously added SdkWarmUpProvider SPI. Follows the same ServiceLoader discovery pattern the SDK already uses for SdkHttpService (the HTTP-client loader).

Modifications

  • Add SdkWarmUp in software.amazon.awssdk.core.crac: a @SdkPublicApi final utility with a single static prime(). It discovers every SdkWarmUpProvider on the classpath via ServiceLoader and invokes warmUp() on each. Modeled on the HTTP-client ServiceLoader loader pattern.
  • Runs at most once per JVM (idempotent), thread-safe via an AtomicBoolean run-once guard.
  • Per-provider failure containment: a provider that throws or fails to load is logged and skipped so the others still run. Empty classpath is a safe no-op.
  • Add internal loader plumbing in software.amazon.awssdk.core.internal.crac (WarmUpInvoker + ClasspathWarmUpInvoker + WarmUpServiceLoader), splitting the discover-and-invoke logic from the static entry point so it stays testable.
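The discovery step described in the bullets above can be illustrated with a minimal, self-contained sketch. It is not the SDK's code: `DiscoverableProvider` and `DiscoverySketch` are hypothetical stand-ins for `SdkWarmUpProvider` and the internal invoker, and only the standard `java.util.ServiceLoader` API is used. With no `META-INF/services` registration on the classpath, the loop body never runs, which is the "safe no-op" case.

```java
import java.util.ServiceLoader;

// Hypothetical stand-in for the SdkWarmUpProvider SPI (not the real interface).
interface DiscoverableProvider {
    void warmUp();
}

// Hypothetical sketch of discover-and-invoke; failure containment is omitted here.
final class DiscoverySketch {
    private DiscoverySketch() {
    }

    /** Invokes warmUp() on every provider ServiceLoader finds and returns how many were found. */
    static int discoverAndInvoke(ClassLoader classLoader) {
        int discovered = 0;
        // ServiceLoader reads META-INF/services/<interface name> entries visible to classLoader.
        for (DiscoverableProvider provider : ServiceLoader.load(DiscoverableProvider.class, classLoader)) {
            provider.warmUp();
            discovered++;
        }
        // Zero means nothing was registered, so nothing was invoked.
        return discovered;
    }
}
```

Calling `DiscoverySketch.discoverAndInvoke(DiscoverySketch.class.getClassLoader())` without any registered provider returns 0 and has no side effects.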

License

  • I confirm that this pull request can be released under the Apache 2 license

@joviegas joviegas requested a review from a team as a code owner June 19, 2026 17:59
return;
}

ClasspathWarmUpInvoker.create().invokeAll();

Contributor

If this fails (throws an error), should we reset PRIMED?

Contributor Author

Good callout.
I updated the logic to set primed only after invokeAll() succeeds, and also made this part thread-safe.
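One way to get the behavior described in this reply is sketched below. The merged code is not shown in this thread, so this is an assumption about its shape: `PrimeOnceSketch`, `LOCK`, and `isPrimed()` are hypothetical names, and the `Runnable` parameter stands in for `ClasspathWarmUpInvoker.create().invokeAll()`. The point is that the flag flips only after the work returns normally, so a failed attempt can be retried.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for SdkWarmUp; names and structure are illustrative only.
final class PrimeOnceSketch {
    private static final AtomicBoolean PRIMED = new AtomicBoolean(false);
    private static final Object LOCK = new Object();

    private PrimeOnceSketch() {
    }

    /** Runs invokeAll at most once successfully; a failed attempt leaves the guard unset. */
    static void prime(Runnable invokeAll) {
        if (PRIMED.get()) {
            return; // fast path: already primed
        }
        synchronized (LOCK) {
            if (PRIMED.get()) {
                return; // another thread finished while this one waited for the lock
            }
            invokeAll.run();  // if this throws, PRIMED stays false
            PRIMED.set(true); // reached only after a successful run
        }
    }

    static boolean isPrimed() {
        return PRIMED.get();
    }
}
```

The lock serializes concurrent callers so the work cannot run twice in parallel, while the atomic flag keeps later calls cheap.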

}

invokedAny = true;
try {

Contributor

Is there a reason to have two separate try/catch blocks here?

Contributor Author

Each block catches a different failure.

The first block wraps iterator.next(), which is where a provider is located and instantiated. Per the ServiceLoader javadoc, this can throw ServiceConfigurationError if a provider-configuration file violates the specified format, if it names a provider class that cannot be found and instantiated, or if the result is not assignable to the service type. We catch that, log it, and continue with the remaining providers, so one bad provider does not abort the rest. Wrapping next() alone, not in a wider RuntimeException catch, keeps any other failure from next() visible.

The second block wraps provider.warmUp() and catches RuntimeException thrown by the provider's own code. We contain that one provider and still run the rest.

We intentionally contain failures from next() and warmUp() only. hasNext() stays in the loop condition, so if it throws we let it propagate rather than swallow it.
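The loop shape described above can be sketched as follows. This is an illustration rather than the merged code: `WarmUpProviderSketch` and `InvokerSketch` are hypothetical names, and logging is reduced to `System.err` to keep the example self-contained.

```java
import java.util.Iterator;
import java.util.ServiceConfigurationError;

// Hypothetical stand-in for the SdkWarmUpProvider SPI.
interface WarmUpProviderSketch {
    void warmUp();
}

// Hypothetical sketch of the per-provider failure containment discussed above.
final class InvokerSketch {
    private InvokerSketch() {
    }

    /** Returns the number of providers whose warmUp() completed normally. */
    static int invokeAll(Iterator<WarmUpProviderSketch> iterator) {
        int succeeded = 0;
        // hasNext() is deliberately outside any try block: if it throws, the error propagates.
        while (iterator.hasNext()) {
            WarmUpProviderSketch provider;
            try {
                // First block: locating and instantiating the provider.
                provider = iterator.next();
            } catch (ServiceConfigurationError e) {
                System.err.println("Skipping provider that failed to load: " + e.getMessage());
                continue;
            }
            try {
                // Second block: the provider's own code.
                provider.warmUp();
                succeeded++;
            } catch (RuntimeException e) {
                System.err.println("Provider warm-up failed: " + e.getMessage());
            }
        }
        return succeeded;
    }
}
```

Keeping the two catches separate means a load failure and a warm-up failure can be reported differently, and neither catch is wider than the failure it is meant to contain.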

return ServiceLoader.load(SdkWarmUpProvider.class, classLoader).iterator();
}

private static final class CountingProvider implements SdkWarmUpProvider {

Contributor

I'm a little confused about why we have effectively three test implementations that do the same thing (CountingProvider, RealProvider and CountingWarmUpProvider). And why does this one use a private invocation counter while the others use a public constant?

Contributor Author

These got added as I built out the test scenarios. You're right that two of them did the same thing, so I removed the duplicate (RealProvider) and reused the existing one. I also renamed the remaining two to say how each is supplied:

  • CountingWarmUpProvider (old) -> RegisteredWarmUpProvider (new): registered in META-INF/services and discovered by a real ServiceLoader (so it's public with a no-arg constructor). The counter is static because ServiceLoader builds the instance and the test can't hold a reference to it, and public so that ClasspathWarmUpInvokerTest (in a different package) can read it when testing both failed and successful warm-up loading.
  • CountingProvider: a local stub handed straight to the invoker (no ServiceLoader), so it keeps a private instance counter.
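The difference between the two remaining test doubles can be sketched roughly as below. The class names here are hypothetical, not the ones in the PR, and the classes are package-private only so the example fits in one file; a provider that is really registered in META-INF/services would need to be public.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the SdkWarmUpProvider SPI.
interface TestWarmUpProvider {
    void warmUp();
}

// Style 1: meant to be instantiated by ServiceLoader, so the test never holds the instance.
// The counter is therefore static and readable from outside the class.
final class RegisteredStyleProvider implements TestWarmUpProvider {
    public static final AtomicInteger INVOCATIONS = new AtomicInteger();

    public RegisteredStyleProvider() {
        // ServiceLoader requires a no-arg constructor.
    }

    @Override
    public void warmUp() {
        INVOCATIONS.incrementAndGet();
    }
}

// Style 2: created by the test and handed straight to the invoker,
// so a private per-instance counter is enough.
final class LocalStubProvider implements TestWarmUpProvider {
    private final AtomicInteger invocations = new AtomicInteger();

    @Override
    public void warmUp() {
        invocations.incrementAndGet();
    }

    int invocations() {
        return invocations.get();
    }
}
```

A static counter is shared across instances and typically needs resetting between tests, which is one reason to prefer the instance counter whenever the test can hold the object.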

@joviegas joviegas merged commit 98d7f72 into feature/master/crac_auto_priming_support Jun 23, 2026
3 of 4 checks passed

@github-actions github-actions Bot locked as resolved and limited conversation to collaborators Jun 23, 2026